IJCT

Analysis of Artificial Intelligence Speech Recognition Technology

Ye Na Lee, Ui Jeng Kang, Eun Jung Choi

Seoul Women’s University

621, Hwarang-ro, Nowon-gu, Seoul, Republic of Korea

[email protected], [email protected], [email protected]

Abstract

As artificial intelligence (AI) evolves, it is being combined with many technologies. Language and voice recognition were the starting point of the new machine learning capabilities that have emerged in recent years. In this paper, we introduce trends in AI personal assistant services and speech recognition technology and propose future directions for development.

Keywords: AI, speech recognition technology

1. Introduction

AI refers to programs that explore solutions through machine learning by thinking, learning, and judging like a human being [1]. AI has converged with various fields, and one of the areas where its achievements have been most prominent is speech recognition. According to Gartner, the AI speaker market is expected to grow to $2.1 billion by 2020, so many companies are trying to gain an early lead in it. In May, at the Google I/O conference, Google introduced Duplex, which can make phone bookings on behalf of users; the AI tricked our ears into thinking a robot was human. In this paper, we introduce trends in AI personal assistant services and speech recognition technology and present future directions for development.

2. Service using speech recognition

2.1. AI personal assistant

An AI personal assistant is software that provides services through voice or text conversation with the user. It analyzes the dialogue, extracts the user’s intent from the context, and processes that information to provide a personalized service.

Goal-oriented spoken dialogue systems have been the most prominent component of today’s AI personal assistants.
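The intent-extraction step described above can be illustrated with a minimal sketch. It is not taken from any of the assistants discussed here: the intent names, the regular expressions, and the function `extract_intent` are all hypothetical, and a production assistant would most likely use a trained language-understanding model rather than hand-written patterns.

```python
import re

# Hypothetical intent patterns, for illustration only.
# Real assistants typically rely on trained models instead of regexes.
INTENT_PATTERNS = {
    "book_reservation": re.compile(r"\b(book|reserve)\b.*\b(table|appointment)\b"),
    "play_good_news": re.compile(r"\btell me something good\b"),
    "add_to_shopping_list": re.compile(
        r"\badd (?P<item>.+) to (my|the) shopping list\b"
    ),
}


def extract_intent(utterance):
    """Return (intent, slots) for the first matching pattern.

    Falls back to ("unknown", {}) when no pattern matches.
    """
    text = utterance.lower().strip()
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # groupdict() holds only the named groups, i.e. the slots.
            return intent, match.groupdict()
    return "unknown", {}
```

For example, `extract_intent("Add milk to my shopping list")` yields the intent `add_to_shopping_list` with the slot `item` set to `milk`, which a goal-oriented dialogue system could then pass on to the service that fulfils the request.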

Table 1. AI personal assistant

Services	Google Assistant	Apple Siri	Bixby	Amazon Alexa
Features	- Continued conversation - Works with more than 5,000 smart home devices - 38 languages - Delete specific recordings	- User profiling for voice input processing	- Understands context - Vision service	- Works with 12,000 smart home devices, such as Ring video - Order and manage shopping lists - Delete specific recordings
Home IoT	Google Home	HomePod	Samsung Home IoT	Amazon Echo Show & Spot

A new Google Assistant feature shares a summary of a positive news story when the user prompts it with the simple phrase “Tell me something good.” Alexa, meanwhile, has expanded beyond what it learns from the user’s voice so that it can also grasp visual information. Most AI personal assistants have been simple programs that display results, but the Google and Amazon assistants have been enhanced beyond this.

2.2. AI speech recognition technology

IBM Watson can improve recognition accuracy by registering important items, such as product names and related topics, as keywords, and it also provides a Text to Speech function. The Google Speech API can identify up to four languages in the input and convert the speech to text, and it recognizes and annotates multi-channel audio. Kaldi is a popular open-source toolkit that is available free of charge on GitHub.
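To show why registering keywords can raise accuracy, the following toy sketch rescores candidate transcripts so that hypotheses containing a registered keyword are preferred. It does not reproduce the IBM Watson API or any other real service: the function `rescore_with_keywords`, the `boost` value, and the scores are assumptions made for illustration only.

```python
def rescore_with_keywords(hypotheses, keywords, boost=0.1):
    """Pick the best transcript after boosting hypotheses that contain keywords.

    hypotheses: list of (transcript, score) pairs; a higher score is better.
    keywords:   single-word domain terms, such as product names.
    boost:      hypothetical bonus added once per distinct keyword found.
    """
    keyword_set = {k.lower() for k in keywords}
    best_text, best_score = None, float("-inf")
    for text, score in hypotheses:
        words = set(text.lower().split())
        # Count the distinct keywords present in this hypothesis.
        adjusted = score + boost * len(words & keyword_set)
        if adjusted > best_score:
            best_text, best_score = text, adjusted
    return best_text
```

With the candidates `[("play the eco show", 0.60), ("play the echo show", 0.55)]` and the keyword `Echo`, the second transcript is selected even though its raw score is lower, because the product name is known in advance.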

Table 2. AI speech recognition technology

Services	IBM Watson	Google Speech API	Microsoft Bing Speech API	Dialogflow API	CMU Sphinx	Kaldi
Languages	9	120	32	14	7	-
Features	- Uses selected keywords - Text to Speech	- Multilingual identification and text conversion of up to 4 languages - Multi-channel recognition	- Uses LUIS to extract intent and entities from text - Text to Speech	- Supports wearables, mobile devices, smart cars, and speakers	- Low resource requirements, so it can be used on mobile devices - GitHub	- Integration with finite state transducers - Open license
Price					free	free

Recent speech recognition technology can distinguish various kinds of noise, add punctuation when converting speech to text, and identify the speaker of each utterance in a conversation.

2.3. Security risks of AI assistants

AI speakers carry many security risks. For example, an AI speaker may carry out commands from unauthorized users or from other devices. In addition, using an AI assistant on an IoT device exposes it to many Wi-Fi vulnerabilities. The biggest concern is privacy: most companies, with the exception of Google and Amazon (Alexa), do not delete voice recordings [5].
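The risk of executing commands from unauthorized users can be made concrete with a small sketch of the check that such a device would need. This is an assumption-laden illustration, not a mechanism described in [5] or used by any product named here: the command names, the `speaker_score` input (imagined as the output of some speaker verification step), and the `threshold` value are all hypothetical.

```python
# Hypothetical set of commands that should require a verified speaker.
SENSITIVE_COMMANDS = {"unlock_door", "make_purchase", "disable_alarm"}


def authorize(command, speaker_score, threshold=0.8):
    """Decide whether a spoken command may be executed.

    command:       name of the recognized command.
    speaker_score: assumed confidence in [0, 1] that the voice belongs to an
                   enrolled user (from a speaker verification step).
    threshold:     hypothetical minimum score for sensitive commands.
    """
    if command not in SENSITIVE_COMMANDS:
        # Non-sensitive commands are allowed for any speaker.
        return True
    return speaker_score >= threshold
```

A speaker that skips a check of this kind would treat `authorize("unlock_door", 0.2)` the same as a request from the owner, which is exactly the situation described above.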

3. Conclusion

Recently, AI has developed from descriptive analytics to cognitive analytics. AI assistant technology such as Duplex is becoming increasingly human-like, so the problem of distinguishing people from machines must be solved. AI assistants can easily access and collect our information, and these services may have other vulnerabilities as well. As is often the case, whenever a communication advancement like voice recognition starts to go mainstream, criminals looking to take advantage of it aren’t far behind [2].

Acknowledgment

This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the National Program for Excellence in SW (2016-0-00022) supervised by the IITP (Institute for Information & communications Technology Promotion).

References

[1] G. W. Lee, “Geo-Spatial Information System”, Goomibook, 2016.

[2] J. Markoff, “As Artificial Intelligence Evolves, So Does Its Criminal Potential”, The New York Times, October 2016.

[3] J. Hansen and T. Hasan, “Speaker Recognition by Machines and Humans: A Tutorial Review”, IEEE Signal Processing Magazine, vol. 32, pp. 74-99, doi: 10.1109/MSP.2015.2462851, November 2015.

[4] K. Myers et al., “An Intelligent Personal Assistant for Task and Time Management”, AI Magazine, 2007.

[5] C. Wueest, “A Guide to the Security of Voice-Activated Smart Speakers”, An ISTR Special Report, Symantec, November 2017.